Abstract. Regular string-to-string functions enjoy a nice triple characterization
through deterministic two-way transducers (2DFT), streaming string transducers
(SST) and MSO definable functions. This result has recently been lifted to FO
definable functions, with equivalent representations by means of aperiodic 2DFT
and aperiodic 1-bounded SST, extending a well-known result on regular languages.
In this paper, we give three direct transformations: i) from 1-bounded SST to
2DFT, ii) from 2DFT to copyless SST, and iii) from k-bounded to 1-bounded SST.
We give the complexity of each construction and also prove that they preserve
the aperiodicity of transducers. As corollaries, we obtain that FO definable
string-to-string functions are equivalent to SST whose transition monoid is
finite and aperiodic, and to aperiodic copyless SST.

1 Introduction 

The theory of regular languages constitutes a cornerstone in theoretical computer 
science. Initially studied on languages of finite words, it has since been extended 
in numerous directions, including finite and infinite trees. Another natural
extension is moving from languages to transductions. We are interested in this
work in string-to-string transductions, and more precisely in string-to-string
functions. One of the strengths of the class of regular languages is their
equivalent presentation by means of automata, logic, algebra and regular
expressions. The class of so-called regular string functions enjoys a similar
multiple presentation. It can indeed be alternatively defined using
deterministic two-way finite state transducers (2DFT), using Monadic
Second-Order graph transductions interpreted on strings (MSOT), and using the
model of streaming string transducers (SST). More precisely, regular string
functions are equivalent to different classes of SST, namely copyless SST [1]
and k-bounded SST, for every positive integer k. Different papers have proposed
transformations between 2DFT, MSOT and SST, summarized in Figure 1.

The connection between automata and logic, which has been very fruitful for 
model-checking for instance, also needs to be investigated in the framework of 
transductions. As it has been done for regular languages, an important objective 
is then to provide similar logic-automata connections for subclasses of regular 
functions, providing decidability results for these subclasses. As an illustration, 
Fig. 1: Summary of transformations between equivalent models; k-b. stands for
k-bounded. Plain (resp. dotted) arrows concern regular models (resp. bracketed
models). Original constructions presented in this paper are depicted by thick
dashed arrows and are valid for both regular and aperiodic versions of the models.

the class of rational functions (accepted by one-way finite state transducers)
admits a simple characterization in terms of logic, as shown in [9]. The
corresponding logical fragment is called order-preserving MSOT. The decidability
of the one-way definability of a two-way transducer, proved in earlier work, thus
yields the decidability of this fragment inside the class of MSOT.

The first-order logic considered with order predicate constitutes an important
fragment of the monadic second order logic. It is well known that languages
definable using this logic are equivalent to those recognized by finite state
automata whose transition monoid is aperiodic (as well as other models such as
star-free regular expressions). These positive results have motivated the study
of similar connections between first-order definable string transformations
(FOT) and restrictions of state-based transducer models. Two recent works
provide such characterizations for 1-bounded SST and 2DFT respectively [11,4].
To this end, the authors study a notion of transition monoid for these
transducers, and prove that FOT is expressively equivalent to transducers whose
transition monoid is aperiodic by providing back and forth transformations
between FOT and 1-bounded aperiodic SST (resp. aperiodic 2DFT). In particular,
[11] leaves as an open problem whether FOT is also equivalent to aperiodic
copyless SST and to aperiodic k-bounded SST, for every positive integer k. It is
also worth noticing that these characterizations of FOT, unlike the case of
languages, do not allow one to decide the class FOT inside the class MSOT.
Indeed, while decidability for languages relies on the syntactic congruence of
the language, no such canonical object exists for the class of regular string
transductions.

In this work, we aim at improving our understanding of the relationships
between 2DFT and SST. We first provide an original transformation from
1-bounded (or copyless) SST to 2DFT, and study its complexity. While the existing
construction used MSO transformations as an intermediate formalism, resulting
in a non-elementary complexity, our construction is in double exponential time,
and in single exponential time if the input SST is copyless. Conversely, we
describe a direct construction from 2DFT to copyless SST, which is similar to
that of [1], but avoids the use of an intermediate model. These constructions
also make it possible to establish links between the crossing degree of a 2DFT
and the number of variables of an equivalent copyless (resp. 1-bounded) SST, and
conversely. Last, we provide a direct construction from k-bounded SST to
1-bounded SST, while the existing one was using copyless SST as a target model
and not 1-bounded SST [3]. These constructions are represented by thick dashed
arrows in Figure 1.

In order to lift these constructions to aperiodic transducers, we introduce 
a new transition monoid for SST, which is intuitively more precise than the 
existing one (more formally, the existing one divides the one we introduce). We 
use this new monoid to prove that the three constructions we have considered 
above preserve the aperiodicity of the transducer. As a corollary, this implies
that FOT is equivalent to both aperiodic copyless and k-bounded SST, for every
integer k, two results that were stated as conjectures in [11] (see Figure 1).

2 Definitions 

2.1 Words, Languages and Transducers 

Given a finite alphabet A, we denote by A* the set of finite words over A, and
by ε the empty word. The length of a word u ∈ A* is its number of symbols,
denoted by |u|. For all i ∈ {1, ..., |u|}, we denote by u[i] the i-th letter of u.

A language over A is a set L ⊆ A*. Given two alphabets A and B, a
transduction from A to B is a relation R ⊆ A* × B*. A transduction R is
functional if it is a function. The transducers we will introduce will define
transductions. We will say that two transducers T, T' are equivalent whenever
they define the same transduction.

Automata A deterministic two-way finite state automaton (2DFA) over a finite
alphabet A is a tuple 𝒜 = (Q, q_0, F, δ) where Q is a finite set of states,
q_0 ∈ Q is the initial state, F ⊆ Q is a set of final states, and δ is the
transition function, of type δ : Q × (A ⊎ {⊢, ⊣}) → Q × {+1, 0, −1}. The new
symbols ⊢ and ⊣ are called endmarkers.

An input word u is given enriched by the endmarkers, meaning that 𝒜 reads
the input ⊢u⊣. We set u[0] = ⊢ and u[|u| + 1] = ⊣. Initially the head of 𝒜 is
on the first cell ⊢ in state q_0 (the cell at position 0). When 𝒜 reads an input
symbol, depending on the transitions in 𝒜, its head moves to the left (−1), or
stays at the same position (0), or moves to the right (+1). To ensure that
the reading of 𝒜 does not go out of bounds, we assume that there is no
transition moving to the left (resp. to the right) on input symbol ⊢ (resp. ⊣).
𝒜 stops as soon as it reaches the endmarker ⊣ in a final state.

A configuration of 𝒜 is a pair (q, i) ∈ Q × N where q is a state and i is a
position on the input tape. A run ρ of 𝒜 is a finite sequence of configurations.
The run ρ = (p_1, i_1) ... (p_m, i_m) is a run on an input word u ∈ A* of length n
if i_m ≤ n + 1, and for all k ∈ {1, ..., m − 1}, 0 ≤ i_k ≤ n + 1 and
(p_k, u[i_k], p_{k+1}, i_{k+1} − i_k) ∈ δ.
It is accepting if p_1 = q_0, i_1 = 0, and m is the only index where both
i_m = n + 1 and p_m ∈ F. The language of a 2DFA 𝒜, denoted by L(𝒜), is the set
of words u such that there exists an accepting run of 𝒜 on u.

Fig. 2: Aperiodic 2DFT (left) and SST (right) realizing the function f.

Transducers Deterministic two-way finite state transducers (2DFT) over A
extend 2DFA with a one-way left-to-right output tape. They are defined as 2DFA
except that the transition relation δ is extended with outputs:
δ : Q × (A ⊎ {⊢, ⊣}) → B* × Q × {−1, 0, +1}. When a transition (q, a, v, q', m)
is fired, the word v is appended to the right of the output tape.

A run of a 2DFT is a run of its underlying automaton, i.e. the 2DFA obtained
by ignoring the output (called its underlying input automaton). A run ρ may be
simultaneously a run on a word u and on a word u' ≠ u. However, when the
input word is given, there is a unique sequence of transitions associated with ρ.
Given a 2DFT T, an input word u ∈ A* and a run ρ = (p_1, i_1) ... (p_m, i_m) of T
on u, the output of ρ on u is the word obtained by concatenating the outputs of
the transitions followed by ρ. If ρ contains a single configuration, this output
is simply ε. The transduction defined by T is the relation R(T) defined as the
set of pairs (u, v) ∈ A* × B* such that v is the output of an accepting run ρ on
the word u. As T is deterministic, such a run is unique, thus R(T) is a function.

Streaming String Transducers Let 𝒳 be a finite set of variables denoted by
X, Y, ... and B be a finite alphabet. A substitution σ is defined as a mapping
σ : 𝒳 → (B ∪ 𝒳)*. Let S_{𝒳,B} be the set of all substitutions. Any substitution
σ can be extended to σ̂ : (B ∪ 𝒳)* → (B ∪ 𝒳)* in a straightforward manner.
The composition σ_1σ_2 of two substitutions σ_1 and σ_2 is defined as the standard
function composition σ̂_1σ_2, i.e. σ̂_1σ_2(X) = σ̂_1(σ_2(X)) for all X ∈ 𝒳. We say
that a string u ∈ (B ∪ 𝒳)* is k-linear if each X ∈ 𝒳 occurs at most k times in
u. A substitution σ is k-linear if σ(X) is k-linear for all X. It is copyless if
for any variable X, there exists at most one variable Y such that X occurs in
σ(Y), and X occurs at most once in σ(Y).

A streaming string transducer (SST) is a tuple T = (A, B, Q, q_0, Q_f, δ, 𝒳, ρ, F)
where (Q, q_0, Q_f, δ) is a one-way automaton, A and B are the finite input and
output alphabets respectively, 𝒳 is a finite set of variables, ρ : δ → S_{𝒳,B} is
a variable update and F : Q_f → (𝒳 ∪ B)* is the output function.

Example 1. As an example, let f : {a, b}* → {a, b}* be the function mapping any
word u = a^{k_0} b a^{k_1} ··· b a^{k_n} to the word
f(u) = a^{k_0} b^{k_0} a^{k_1} b^{k_1} ··· a^{k_n} b^{k_n} obtained by adding
after each block of consecutive a a block of consecutive b of the same length.
Since each word u over A can be uniquely written u = a^{k_0} b a^{k_1} ··· b a^{k_n}
with some k_i being possibly equal to 0, the function f is well defined. We give
in Figure 2 a 2DFT and an SST that realize f.
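
As a hedged illustration of Example 1, the following Python sketch simulates one
possible copyless SST for f, with a single state and two variables X and Y. The
machine itself and the names `f_direct` and `sst_f` are assumptions made for this
sketch; the SST drawn in Figure 2 may differ.

```python
def f_direct(u: str) -> str:
    """Reference implementation of f: each block a^k becomes a^k b^k."""
    return "".join(block + "b" * len(block) for block in u.split("b"))


def sst_f(u: str) -> str:
    """Simulate a hypothetical one-state copyless SST with variables X and Y."""
    x, y = "", ""  # every variable starts with the empty word
    for letter in u:
        if letter == "a":
            # Update X := X a, Y := Y b (each variable is used exactly once).
            x, y = x + "a", y + "b"
        elif letter == "b":
            # Update X := X Y, Y := empty word (X and Y are each used once).
            x, y = x + y, ""
        else:
            raise ValueError("input must be a word over {a, b}")
    return x + y  # output function F = X Y


if __name__ == "__main__":
    for word in ["", "aabab", "bb", "ba"]:
        print(repr(word), "->", repr(sst_f(word)))
```

Every update uses each variable at most once on the right-hand sides, which is
what makes this sketch copyless in the sense defined above.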

The concept of a run of an SST is defined in an analogous manner to that of
a finite state automaton. The sequence (σ_{r,i})_{0<i≤|r|} of substitutions
induced by a run r = q_0 −a_1→ q_1 −a_2→ q_2 ... q_{n−1} −a_n→ q_n is defined
inductively as follows: σ_{r,i} = σ_{r,i−1} ρ(q_{i−1}, a_i) for 1 < i ≤ |r| and
σ_{r,1} = ρ(q_0, a_1). We denote σ_{r,|r|} by σ_r and say that σ_r is induced by r.

If r is accepting, i.e. q_n ∈ Q_f, we can extend the output function F to r by
F(r) = σ_ε σ_r F(q_n), where σ_ε substitutes all variables by their initial value
ε. For all words u ∈ A*, the output of u by T is defined only if there exists an
accepting run r of T on u, and in that case the output is denoted by
T(u) = F(r). The transformation R(T) is then defined as the set of pairs
(u, T(u)) ∈ A* × B*.

An SST T is copyless if for every transition t ∈ δ, the variable update ρ(t)
is copyless. Given an integer k ∈ N_{>0}, we say that T is k-bounded if all its
runs induce k-linear substitutions. It is bounded if it is k-bounded for some k.
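
To make the definitions of composition, k-linearity and copylessness concrete,
here is a minimal Python sketch. It assumes, purely for illustration, that
variables are single uppercase characters, that output letters are other
characters, and that a substitution is a dictionary from variables to strings;
the helper names and the two updates `rho_a` and `rho_b` are hypothetical.

```python
from collections import Counter


def apply(sigma, word, variables):
    """Extend sigma to words: replace variables, keep output letters."""
    return "".join(sigma[s] if s in variables else s for s in word)


def compose(sigma1, sigma2, variables):
    """Composition sigma1 sigma2: X is mapped to sigma1 applied to sigma2(X)."""
    return {x: apply(sigma1, sigma2[x], variables) for x in variables}


def induced(updates, variables):
    """Substitution induced by a run, given the list of its variable updates."""
    sigma = {x: x for x in variables}  # identity substitution
    for rho in updates:
        sigma = compose(sigma, rho, variables)
    return sigma


def is_k_linear(sigma, variables, k):
    """Each variable occurs at most k times in every sigma(Y)."""
    return all(Counter(sigma[y])[x] <= k for y in variables for x in variables)


def is_copyless(sigma, variables):
    """Each variable occurs at most once over all right-hand sides together."""
    total = Counter("".join(sigma[y] for y in variables))
    return all(total[x] <= 1 for x in variables)


if __name__ == "__main__":
    VARS = {"X", "Y"}
    rho_a = {"X": "Xa", "Y": "Yb"}   # hypothetical update on letter a
    rho_b = {"X": "XY", "Y": ""}     # hypothetical update on letter b
    sigma_r = induced([rho_a, rho_b], VARS)
    sigma_eps = {x: "" for x in VARS}
    # Output of the run with output function F = X Y.
    print(apply(sigma_eps, apply(sigma_r, "XY", VARS), VARS))
```

The substitution {X ↦ X, Y ↦ X} is a small example that is 1-linear but not
copyless, since X occurs in the images of two different variables.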

The following theorem gives the expressiveness equivalence of the models we
consider. We do not give the definitions of MSO graph transductions as our
results will only involve state-based transducers (see [5] for more details).

Theorem 1. Let f : A* → B* be a function over words. Then the
following conditions are equivalent:

— f is realized by an MSO graph transduction,

— f is realized by a 2DFT,

— f is realized by a copyless SST,

— f is realized by a bounded SST.

2.2 Transition monoid of transducers 

A (finite) monoid M is a (finite) set equipped with an associative internal law
·_M having a neutral element for this law. A morphism η : M → N between monoids
is a map from M to N that preserves the internal laws, meaning that
for all x and y in M, η(x ·_M y) = η(x) ·_N η(y). When the context is clear, we
will write xy instead of x ·_M y. A monoid M divides a monoid N if there exists
an onto morphism from a submonoid of N to M. A monoid M is said to be
aperiodic if there exists a least integer n, called the aperiodicity index of M,
such that for all elements x of M, we have x^n = x^{n+1}.
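
The aperiodicity index can be computed by brute force on small examples. The
sketch below assumes that monoid elements are transformations of a state set
{0, ..., n−1}, encoded as tuples, and that the monoid is generated by a few such
transformations; the function names are hypothetical.

```python
def compose_maps(f, g):
    """Apply f first, then g; maps are tuples over the states 0..n-1."""
    return tuple(g[f[q]] for q in range(len(f)))


def generated_monoid(generators, n):
    """All transformations obtained as products of the generators."""
    identity = tuple(range(n))
    elements, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = compose_maps(x, g)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements


def aperiodicity_index(elements):
    """Least n with x^n = x^(n+1) for all x, or None if the monoid is not aperiodic."""
    index = 0
    for x in elements:
        power, seen, n = tuple(range(len(x))), [], 0   # power holds x^n
        while power not in seen:
            nxt = compose_maps(power, x)
            if nxt == power:
                break                    # x^n = x^(n+1)
            seen.append(power)
            power, n = nxt, n + 1
        else:
            return None                  # the powers of x cycle without stabilizing
        index = max(index, n)
    return index


if __name__ == "__main__":
    # Minimal automaton of (ab)* with sink state 2 (an illustrative choice).
    a, b = (1, 2, 2), (2, 0, 2)
    print(aperiodicity_index(generated_monoid([a, b], 3)))
    # A two-state parity counter is the standard non-aperiodic example.
    print(aperiodicity_index(generated_monoid([(1, 0)], 2)))
```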

Given an alphabet A , the set of words A* is a monoid equipped with the 
concatenation law, having the empty word as neutral element. It is called the 
free monoid on A. A finite monoid M recognizes a language L of A* if there exists 
an onto morphism η : A* → M such that L = η^{-1}(η(L)). It is well-known that
the languages recognized by finite monoids are exactly the regular languages. 

The monoid we construct from a machine is called its transition monoid. We 
are interested here in aperiodic machines, in the sense that a machine is aperiodic 
if its transition monoid is aperiodic. We now give the definition of the transition 
monoid for a 2DFT and an SST. 
Deterministic Two-Way Finite State Transducers As in the case of automata, the
transition monoid of a 2DFT T is the set of all possible behaviors of T on a
word. The following definition comes from [4], using ideas from [14] amongst
others.

As a word can be read in both ways, the possible runs are split into four
relations over the set of states Q of T. Given an input word u, we define the
left-to-left behavior bh_{ℓℓ}(u) as the set of pairs (p, q) of states of T such
that there exists a run over u starting on the first letter of u in state p and
exiting u on the left in state q (see the figure on the right). We define in
an analogous fashion the left-to-right, right-to-left and right-to-right
behaviors denoted respectively bh_{ℓr}(u), bh_{rℓ}(u) and bh_{rr}(u). Then the
transition monoid of a 2DFT is defined as follows:

Let T = (Q, A, δ, q_0, F) be a 2DFT. The transition monoid of T is A*/∼_T
where ∼_T is the conjunction of the four relations ∼_{ℓℓ}, ∼_{ℓr}, ∼_{rℓ} and
∼_{rr}, defined for any words u, u' of A* as follows: u ∼_{xy} u' iff
bh_{xy}(u) = bh_{xy}(u'), for x, y ∈ {ℓ, r}. The neutral element of this monoid
is the class of the empty word ε, whose behavior bh_{xy}(ε) is the identity
function if x ≠ y, and is the empty relation otherwise.

Note that since the set of states of T is finite, each behavior relation is of 
finite index and consequently the transition monoid of T is also finite. Let us 
also remark that the transition monoid of T does not depend on the output and 
is in fact the transition monoid of the underlying 2DFA. 
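
The four behaviors of a word can be computed by direct simulation. The sketch
below is an illustration under stated assumptions: the transition function is a
dictionary `delta` mapping (state, letter) to (next state, move), endmarkers are
ignored since only the factor u is read, and the small automaton in the usage
part is hypothetical.

```python
def behaviors(delta, states, u):
    """Return the four behavior relations of a deterministic two-way automaton on u.

    Keys are "ll", "lr", "rl", "rr"; each value is a set of pairs (p, q).
    """
    result = {"ll": set(), "lr": set(), "rl": set(), "rr": set()}
    for side, start in (("l", 0), ("r", len(u) - 1)):
        for p in states:
            q, pos, seen = p, start, set()
            # Follow the unique run until it leaves u, gets stuck, or loops.
            while 0 <= pos < len(u) and (q, pos) not in seen:
                if (q, u[pos]) not in delta:
                    break                      # the run is stuck inside u
                seen.add((q, pos))
                q, move = delta[(q, u[pos])]
                pos += move
            if pos < 0:
                result[side + "l"].add((p, q))   # exits on the left
            elif pos >= len(u):
                result[side + "r"].add((p, q))   # exits on the right
    return result


if __name__ == "__main__":
    delta = {
        (0, "a"): (0, +1), (0, "b"): (1, -1),
        (1, "a"): (1, -1), (1, "b"): (0, +1),
    }
    print(behaviors(delta, [0, 1], "aab"))
```

Two words are then equivalent for ∼_T exactly when `behaviors` returns the same
four relations for both. On the empty word the sketch returns the identity for
the left-to-right and right-to-left behaviors and the empty relation for the two
others, in line with the description of the neutral element.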

Streaming String Transducers A notion of transition monoid for SST was
defined in [11]. We give here its formal definition and refer to [11] for
advanced considerations. In order to describe the behaviors of an SST, this
monoid describes the possible flows of variables along a run. Since we give
later an alternative definition of transition monoid for SST, we will call it
the flow transition monoid (FTM).

Let T be an SST with states Q and variables 𝒳. The flow transition monoid
M_T of T is a set of square matrices over the integers enriched with a new
absorbent element ⊥. The matrices are indexed by elements of Q × 𝒳. Given an
input word u, the image of u in M_T is the matrix m such that for all states p, q
and all variables X, Y, m[p,X][q,Y] = n ∈ N (resp. m[p,X][q,Y] = ⊥) if, and
only if, there exists a run r of T over u from state p to state q, and X occurs n
times in σ_r(Y) (resp. iff there is no run of T over u from state p to state q).

Note that if T is k-bounded, then for all words w, all the coefficients of its
image in M_T are bounded by k. The converse also holds. Then M_T is finite if,
and only if, T is k-bounded, for some k.

It can be checked that the machines given in Example 1 are aperiodic.
Theorem 1 extends to aperiodic subclasses and to first-order logic, as in the
case of regular languages. These results as well as our contributions to these
models are summed up in Figure 1.

Theorem 2 ([11,4]). Let f : A* → B* be a function over words. Then the
following conditions are equivalent:

— f is realized by a FO graph transduction,

— f is realized by an aperiodic 2DFT,

— f is realized by an aperiodic 1-bounded SST.

3 Substitution Transition Monoid 

In this section, we give an alternative take on the definition of the transition 
monoid of an SST, and show that both notions coincide on aperiodicity and 
boundedness. The intuition for this monoid, which we call the substitution
transition monoid, is for the elements to take into account not only the
multiplicity of the output of each variable in a given run, but also the order
in which they appear in the output. It can be seen as an enrichment of the
classic view of transition monoids as the set of functions over states equipped
with the law of composition. Given a substitution σ ∈ S_{𝒳,B}, let us denote by
σ̃ the projection of σ on the set 𝒳, i.e. we forget the parts from B. The
substitutions σ̃ are homomorphisms of 𝒳* which form an (infinite) monoid. Note
that in the case of a 1-bounded SST, each variable occurs at most once in σ̃(Y).

Substitution Transition Monoid of an SST. Let T be an SST with states Q and
variables 𝒳. The substitution transition monoid (STM) of T, denoted M_T^σ, is a
set of partial functions f : Q → Q × S_𝒳. Given an input word u, the image
of u in M_T^σ is the function f_u such that for all states p, f_u(p) = (q, σ̃_r)
if, and only if, there exists a run r of T over u from state p to state q that
induces the substitution σ̃_r. This set forms a monoid when equipped with the
following composition law: given two functions f_u, f_v ∈ M_T^σ, the function
f_{uv} is defined by f_{uv}(q) = (q'', σ̃ ∘ σ̃') whenever f_u(q) = (q', σ̃) and
f_v(q') = (q'', σ̃').
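
The composition law of the STM can be sketched in a few lines of Python. The
encoding is an assumption made for illustration: an element f_u is a dictionary
from states to pairs (state, projected substitution), a projected substitution is
a dictionary from single-character variables to strings of variables, and the
two elements `f_a` and `f_b` describe a hypothetical one-state SST.

```python
def project(sigma, variables):
    """Forget output letters: keep only the variables occurring in each sigma(X)."""
    return {x: "".join(s for s in sigma[x] if s in variables) for x in variables}


def stm_product(f_u, f_v, variables):
    """Compose two STM elements f_u and f_v into f_uv."""
    f_uv = {}
    for q, (q1, s_u) in f_u.items():
        if q1 in f_v:                           # partial functions: skip undefined cases
            q2, s_v = f_v[q1]
            # Projected composition: X -> s_u applied to s_v(X).
            f_uv[q] = (q2, {x: "".join(s_u[y] for y in s_v[x]) for x in variables})
    return f_uv


if __name__ == "__main__":
    VARS = {"X", "Y"}
    f_a = {0: (0, project({"X": "Xa", "Y": "Yb"}, VARS))}
    f_b = {0: (0, project({"X": "XY", "Y": ""}, VARS))}
    print(stm_product(f_a, f_b, VARS))
    print(stm_product(f_b, f_b, VARS) == f_b)   # f_b is idempotent here
```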

We now make a few remarks about this monoid. Let us first observe that
the FTM of T can be recovered from its STM. Indeed, the matrix m associated
with a word u in M_T is easily deduced from the function f_u in M_T^σ. This
observation induces an onto morphism from M_T^σ to M_T, and consequently the FTM
of an SST divides its STM. This proves that if the STM is aperiodic, then so is
the FTM since aperiodicity is preserved by division of monoids. Similarly,
copyless and k-bounded SST (given k ∈ N_{>0}) are characterized by means of their
STM. This transition monoid can be separated into two main components: the
first one being the transition monoid of the underlying deterministic one-way
automaton, which can be seen as a set of functions Q → Q, while the second
one is the monoid S_𝒳 of homomorphisms on 𝒳*, equipped with the composition.
The attentive reader may notice that the STM can be written as the wreath
product of the transformation semigroup (𝒳*, S_𝒳) by (Q, Q^Q). However, as the
monoid of substitutions is obtained through the closure under composition of the
homomorphisms of a given SST, it may be infinite.

The next theorem proves that both notions coincide on aperiodicity, since
the converse comes from the fact that the FTM divides the STM.

Theorem 3. Let T be a k-bounded SST with ℓ variables. If its FTM is aperiodic
with aperiodicity index n then its STM is aperiodic with aperiodicity index at
most n + (k + 1)ℓ.




Fig. 3: The output structure of a partial run of an SST used in the proof of
Theorem 4.

4 From 1-bounded SST to 2DFT 

The existing transformation of a 1-bounded (or even copyless) SST into an
equivalent 2DFT goes through MSO transductions, yielding a non-elementary
complexity. We present here an original construction whose complexity is
elementary.

Theorem 4. Let T be a 1-bounded SST with n states and m variables. Then we
can effectively construct a deterministic 2-way transducer that realizes the same
function. If T is 1-bounded (resp. copyless), then the 2DFT has
O(m 2^{m 2^m} n^n) states (resp. O(m n^n)).

Proof. We define the 2DFT as the composition of a left-to-right sequential
transducer, a right-to-left sequential transducer and a 2-way transducer. Remark
that this proves the result as two-way transducers are closed under composition
with sequential ones.

The left-to-right sequential transducer does a single pass on the input word
and outputs the same word enriched with the transition used by the SST in the
previous step. The right-to-left transducer uses this information to enrich each
position of the input word with the set of useful variables, i.e. the variables
that flow to an output variable according to the partial run on the suffix read.
The two sequential transducers are quite standard. They realize
length-preserving functions that simply enrich the input word with new
information. The last transducer is more interesting: it uses the enriched
information to follow the output structure of T. The output structure of a run
is a labeled and directed graph such that, for each variable X useful at a
position j, we have two nodes X_i^j and X_o^j linked by a path whose concatenated
labels form the value stored in X at position j of the run (see Figure 3).
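
The set of useful variables attached to each position can be sketched as a
backward pass over the updates of a run. This is an offline illustration under
stated assumptions, not the sequential transducer itself: the run is given as the
list of its variable updates, variables are single characters, and the example
updates are hypothetical.

```python
def useful_variables(updates, final_output, variables):
    """useful[j] = variables whose value after j steps flows into the output."""
    useful = [set() for _ in range(len(updates) + 1)]
    # At the end of the run, the useful variables are those in the output word.
    useful[-1] = {s for s in final_output if s in variables}
    for j in range(len(updates), 0, -1):
        rho = updates[j - 1]                 # update applied at step j
        # A variable is useful before step j if it occurs in the new value
        # of a variable that is useful after step j.
        useful[j - 1] = {s for y in useful[j] for s in rho[y] if s in variables}
    return useful


if __name__ == "__main__":
    updates = [
        {"X": "Xa", "Y": "Yb"},   # step 1
        {"X": "X", "Y": "c"},     # step 2 resets Y, so its earlier value is useless
    ]
    print(useful_variables(updates, "XY", {"X", "Y"}))
```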

The transition function of the two-way transducer is described in Figure 4.
It first reaches the end of the word and picks the first variable to output. It
then rewinds the run using the information stored by the first sequential
transducer, producing the said variable using the local update function. When it
has finished computing and producing a variable X, it switches to the following
one, using the information of the second transducer to know which variable Y the
variable X flows into, and starts producing it. Note that such a Y is unique
thanks to the 1-boundedness property. If T is copyless, then this information is
local and the second transducer can be bypassed.

Fig. 4: The third transducer follows the output structure. States indexed by i
correspond to the beginning of a variable, while states indexed by o correspond
to the end. σ (resp. σ') stands for the substitution at position a (resp. a').

Regarding complexity, a careful analysis of the composition of a one-way
transducer of size n with a two-way transducer of size m from [6,4] shows that
this can be done by a two-way transducer of size O(m n^n). Then given a
1-bounded SST with n states and m variables, we can construct a deterministic
two-way transducer of size O(m 2^{m 2^m} n^n). If T is copyless, the second
sequential transducer is omitted, resulting in a size of O(m n^n).


Theorem 5. Let T be an aperiodic 1-bounded SST. Then the equivalent 2DFT
constructed using Theorem 4 is also aperiodic.

Proof. The aperiodicity of the three transducers gives the result, as
aperiodicity is preserved by composition of a one-way transducer with a two-way
transducer [4]. The aperiodicity of the two sequential transducers is
straightforward since their runs depend respectively on the underlying automaton
and the update function. The aperiodicity of the 2DFT comes from the fact that,
since it follows the output structure of the SST, its partial runs are induced by
the flow of variables and their order in the substitutions, which is information
contained in the FTM and thus aperiodic thanks to Theorem 3.

5 From 2DFT to copyless SST 

In [1], the authors give a procedure to construct a copyless SST from a 2DFT.
This procedure uses the intermediate model of heap based transducers. We give
here a direct construction with similar complexity. This simplified presentation
allows us to prove that the construction preserves the aperiodicity.

Theorem 6. Let T be a 2DFT with n states. Then we can effectively construct
a copyless SST with O((2n)^{2n}) states and 2n − 1 variables that computes the
same function.







Fig. 5: Left: The state of the SST is represented in black. The red part
corresponds to the local transitions of the 2DFT. Right: After reading a, we
reduce the new forest by eliminating the useless branches and shortening the
unlabeled linear paths.


Proof (Sketch). The main idea is for the constructed SST to keep track of
the right-to-right behavior of the prefix read until the current position,
similarly to the construction of Shepherdson. This information can be updated
upon reading a new letter, constructing a one-way machine recognizing the same
input language. The idea from [3] is to have one variable per possible
right-to-right run, the number of which is bounded by the number of states.
However, since two right-to-right runs from different starting states can merge,
this construction results in a 1-bounded SST. To obtain copylessness, we keep
track of these merges and the order in which they appear. Different variables
are used to store the production of each run before the merge, and one more
variable stores the production after.

The states of the copyless SST are represented by sets of labeled trees
having the states of the input 2DFT as leaves. Each inner vertex represents one
merging, and two leaves have a common ancestor if the right-to-right runs from
the corresponding states merge at some point. Each tree then models a set of
right-to-right runs that all end in the same state. Note that it is necessary to
also store the end state of these runs. For each vertex, we use one variable to
store the production of the partial run corresponding to the outgoing edge.

Given such a state and an input letter, the transition function can be defined 
by adding to the set of trees the local transitions at the given letter, and then 
reducing the resulting graph in a proper way (see Figure 5).

Finally, as merges occur upon two disjoint sets of states of the 2DFT (initially
singletons), the number of merges, and consequently the number of inner vertices
of our states, is bounded by n − 1. Since there are at most n leaves and one
variable per vertex, an input 2DFT with n states can be realized by an SST
having 2n − 1 variables. Moreover, as states are labeled graphs, Cayley's
formula yields an exponential bound on the number of states.

Moreover, this construction preserves aperiodicity: 

Theorem 7. Let T be an aperiodic 2DFT. Then the equivalent SST constructed
using Theorem 6 is also aperiodic.

Proof. If the input 2DFT is aperiodic of index n, then for any word w, w^n and
w^{n+1} merge the same partial runs for the four kinds of behaviors, by
definition, and in fact the merges appear in the same order. As explained
earlier, the state q_1 (resp. q_2) reached by the constructed SST over the input
uw^n (resp. uw^{n+1}) represents the merges of the right-to-right runs of T over
uw^n (resp. uw^{n+1}). Since these runs can be decomposed into right-to-right
runs over u and partial runs over w^n and w^{n+1}, the merge equivalence between
w^n and w^{n+1} implies that q_1 = q_2. Moreover, since variables are linked to
these merges, the aperiodicity of the merge equivalence implies the aperiodicity
of both the underlying automaton and the substitution function of the SST,
concluding the proof.

As a corollary, we obtain that the class of aperiodic copyless SST is
expressively equivalent to first-order definable string-to-string transductions.

Corollary 1. Let f : A* → B* be a function over words. Then f is realized by
a FO graph transduction iff it is realized by an aperiodic copyless SST.

6 From k-bounded to 1-bounded SST

The existing construction from k-bounded to 1-bounded, presented in [3], builds
a copyless SST. We present an alternative construction that, given a k-bounded
SST, directly builds an equivalent 1-bounded SST. We will prove that this
construction preserves aperiodicity.

Theorem 8. Given a k-bounded SST T with n states and m variables, we can
effectively construct an equivalent 1-bounded SST. This new SST has n 2^N states
and mkN variables, where N = O(n^n (k + 1)^{nm}) is the size of the flow
transition monoid M_T.

Proof. In order to move from a k-bounded SST to a 1-bounded SST, the natural
idea is to use copies of each variable. However, we cannot maintain k copies of
each variable all the time: suppose that X flows into Y and Z, which both occur 
in the final output. If we have k copies of X , we cannot produce in a 1-bounded 
way (and we do not need to) k copies of Y and k copies of Z. 

Now, if we have access to a look-ahead information, we can guess how many 
copies of each variable are needed, and we can easily construct a copyless SST 
by using exactly the right number of copies for each variable and at each step. 
The construction relies on this observation. We simulate a look-ahead through a 
subset construction, having copies of each variable for each possible behavior of 
the suffix. Then given a variable and the behavior of a suffix, we can maintain 
the exact number of variables needed and perform a copyless substitution to 
a potential suffix for the next step. However, since the SST is not necessarily 
co-deterministic, a given suffix can have multiple successors, and the result is 
that its variables flow to variables of different suffixes. As variables of different 
suffixes are never recombined, we obtain a 1-bounded SST. 

Theorem 9. Let T be an aperiodic k-bounded SST. Then the equivalent
1-bounded SST constructed using Theorem 8 is also aperiodic.

As a corollary, we obtain that the class of aperiodic bounded SST is
expressively equivalent to first-order definable string-to-string transductions.

Corollary 2. Let f : A* → B* be a function over words. Then f is realized by a
FO graph transduction iff it is realized by an aperiodic k-bounded SST
(k ∈ N_{>0}).


7 Perspectives 


There is still one model equivalent to the generic machines whose aperiodic
subclass eludes our scope so far, namely the functional two-way transducers,
which correspond to non-deterministic two-way transducers realizing a function.
To complete the picture, a natural approach would then be to consider the
constructions from [7] and prove that aperiodicity is preserved. One could also
think of applying this approach to other varieties of monoids, such as the
J-trivial monoids, equivalent to the boolean closure of existential first-order
formulas BΣ_1[<]. Unfortunately, the closure of such transducers under
composition requires some strong properties on varieties (at least closure under
semidirect product) which are not satisfied by varieties less expressive than
the aperiodic. Consequently the construction from SST to 2DFT cannot be applied.
On the other hand, the other construction could apply, providing one inclusion.
Then an interesting question would be to know where the corresponding fragment
of logic would be positioned.
Appendix


Substitution Transition Monoid 

Theorem 3. Let T be a k-bounded SST with ℓ variables. If its FTM is aperiodic
with aperiodicity index n then its STM is aperiodic with aperiodicity index at
most n + (k + 1)ℓ.

Proof. Let T be a k-bounded SST. We define a loop as the run induced by a pair
$(q,u) \in Q \times A^*$ such that $\delta(q,u) = q$. Suppose now that $M_T$ is aperiodic, and
let $n$ be its aperiodicity index. Wlog, we assume that the transition function of
T is complete. This implies that for all states $p$ of T and all words $u$, there
exists a state $q$ such that $p \xrightarrow{u^n} q \xrightarrow{u} q$. Then if the images in the STM
of the loops (i.e. the set of all substitutions $\sigma$ induced by some loop $(q,u)$)
are aperiodic with index $m$, then the STM is aperiodic with index at most $n + m$.

Consequently, in the following $\sigma$ denotes the substitution of a loop of T, and
we aim to prove that $\sigma^{(k+1)\ell} = \sigma^{(k+1)\ell+1}$.

Before proving this though, we define the relation $< \ \subseteq \mathcal{X} \times \mathcal{X}$ as follows.
Given two variables $X$ and $Y$, we have $X < Y$ if there exists a positive integer
$i$ such that $X$ flows into $Y$ in $\sigma^i$. This relation is clearly transitive. The next
lemma proves that it is also anti-symmetric, hence we can use this relation as
an induction order to prove the result.

Lemma 1. Given two different variables $X$ and $Y$, if $X < Y$ then $Y \not< X$.

Proof. We proceed by contradiction. Assume that there exist two different
variables $X$ and $Y$ and two integers $i$ and $j$ such that $X$ occurs in $\sigma^i(Y)$ and $Y$
occurs in $\sigma^j(X)$.

Then for any $k > 0$, $X$ occurs in $\sigma^{k(i+j)}(X)$ and $Y$ occurs in $\sigma^{k(i+j)+j}(X)$.
As T is aperiodic of index $n$, for $k$ large enough it means that both $X$ and $Y$
occur in both $\sigma^n(X)$ and $\sigma^n(Y)$. Then $\sigma^{2n}(X)$ contains both $\sigma^n(X)$ and $\sigma^n(Y)$
and thus contains at least two occurrences of $X$ and $Y$. Then by aperiodicity we
have $\sigma^{2n}(X) = \sigma^n(X)$, thus $\sigma^n(X)$ contains two occurrences of $X$. By iterating
this process, we prove that the number of occurrences of $X$ in $\sigma^n(X)$ is not
bounded, yielding a contradiction.

We now prove that for all variables $X$ in $\mathcal{X}$, $\sigma^{(k+1)\ell}(X) = \sigma^{(k+1)\ell+1}(X)$ by
treating the following two cases:

— If $X \in \sigma(X)$, then either $\sigma(X) = X$ and then $\sigma^2(X) = \sigma(X)$, or there
exists $Y \neq X$ such that $Y \in \sigma(X)$. In the latter case, we get by iteration
that for all $i > 0$, $|\sigma^i(X)| \geq \sum_{j<i} |\sigma^j(Y)|$. Then as T is k-bounded, we have
$|\sigma^i(X)| \leq k\ell$ and thus $\sum_{j<i} |\sigma^j(Y)|$ is bounded, and $\sigma^{k\ell}(Y) = \varepsilon$, which proves
that $\sigma^{k\ell}(X) = \sigma^{k\ell+1}(X)$.

— If $X \notin \sigma(X)$, let us consider the relation $<$. By Lemma 1 this relation is
cycle-free. Then there is a lowest level, consisting of the variables $Y$
such that $\sigma(Y) \subseteq \{Y\}$. There, either $\sigma(Y) = \emptyset$ and aperiodicity becomes
trivial, or $\sigma(Y) = Y$ and the case was dealt with in the previous point and
is thus aperiodic with index $k\ell$. Now we can end the proof by reasoning
by induction on $<$: if $X \notin \sigma(X)$ and all variables $Y < X$ are aperiodic
with index $i$, then $\sigma^{i+1}(X)$ can be written as the concatenation of $\sigma^i(Y)$,
for aperiodic variables $Y$ of index $i$. Then $\sigma^{i+1}(X) = \sigma^{i+2}(X)$. The proof is
concluded by noticing that the length of the longest chain of $<$ is bounded
by $\ell$.
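
To make the notion of aperiodicity index of a loop substitution concrete, here is
a minimal sketch (an illustration, not part of the proof). It iterates a
substitution whose output letters have already been erased and returns the least
$n$ with $\sigma^n = \sigma^{n+1}$. The dictionary encoding and the names `power_step` and
`aperiodicity_index` are assumptions made for this example only.

```python
def power_step(current, sigma):
    """Given sigma^n (as a dict), return sigma^(n+1) by substituting sigma once more."""
    return {x: tuple(v for y in word for v in sigma[y]) for x, word in current.items()}


def aperiodicity_index(sigma, max_steps):
    """Least n <= max_steps with sigma^n == sigma^(n+1), or None if there is none."""
    current = {x: (x,) for x in sigma}          # sigma^0 is the identity
    for n in range(max_steps + 1):
        nxt = power_step(current, sigma)
        if nxt == current:
            return n
        current = nxt
    return None


# Hypothetical loop with three variables: X := X Y, Y := Z, Z := (empty).
loop = {"X": ("X", "Y"), "Y": ("Z",), "Z": ()}
# Hypothetical non-aperiodic loop: X and Y are swapped at every iteration.
swap = {"X": ("Y",), "Y": ("X",)}

print(aperiodicity_index(loop, 6))
print(aperiodicity_index(swap, 6))
```

In the first example every variable occurs at most once, so $k = 1$ and $\ell = 3$, and
the search bound 6 is the value $(k+1)\ell$ from the statement.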


From 1-bounded SST to 2DFT 

Theorem 4. Let T be a 1-bounded SST with $n$ states and $m$ variables. Then we
can effectively construct a deterministic 2-way transducer that realizes the same
function. If T is 1-bounded (resp. copyless), then the 2DFT has $O(m 2^{m 2^m} n^n)$
states (resp. $O(m n^n)$).

Proof. Let $T = (A, B, Q, q_0, Q_f, \delta, \mathcal{X}, \rho, F)$ be an SST. Let us construct a two-way
transducer $\mathcal{A}$ that realizes the same function. The transducer $\mathcal{A}$ will follow the
output structure (see Figure 3) of T and construct the output as it appears in
the structure. To make the proof easier to read, we define $\mathcal{A}$ as the composition
of a left-to-right sequential transducer $\mathcal{A}'$, a right-to-left sequential transducer
$\mathcal{A}''$ and a 2-way transducer $\mathcal{B}$. Remark that this proves the result as two-way
transducers are closed under composition with sequential ones. The transducer
$\mathcal{A}'$ does a single pass on the input and enriches it with the transition used by T in
the previous step. The second transducer uses this information, and enriches the
input word with the set of variables that will be produced from this position.
The last transducer is more interesting: it uses the enriched information to
follow the output structure of T. The output structure of a run is a labeled and
directed graph such that, for each variable $X$ useful at a position $j$, we have
two nodes $X^i$ and $X^o$ linked by a path whose concatenated labels form the value
stored in $X$ at position $j$ of the run (see Figure 3).
The set of variables will be used to resolve the nondeterminism due to the
1-bounded property. Note that in the case of a copyless SST, the transducer $\mathcal{A}''$
can be omitted. We now explain how the several transducers behave on a given
run $r = q_0 q_1 \ldots q_n$.

The transducer $\mathcal{A}' = (A, A \times Q, Q \uplus \{f\}, q_0, \alpha, \beta, \{f\})$, which enriches the
input word with the transitions of the previous step, can be done easily with a
1-way transducer, which first stores the transition taken in its state, then outputs
it along the current letter read. Given a letter $a$ and a state $q$, we have the
transitions $q \xrightarrow{a \mid (a,q)} \delta(q,a)$. We also have $q \xrightarrow{\dashv \mid (\dashv,q)} f$. Then on the run $r$, if
$u = a_1 \ldots a_n$, we get the output word
$\mathcal{A}'(\vdash u \dashv) = (a_1, q_0)(a_2, q_1) \cdots (a_n, q_{n-1})(\dashv, q_n)$.

The transducer $\mathcal{A}'' = (A \times Q, A \times Q \times 2^{\mathcal{X}}, 2^{\mathcal{X}} \uplus \{i, f'\}, i, \alpha, \beta, \{f'\})$, which
enriches each letter of the input word with the variables effectively produced from
this step, can be done easily with a right-to-left sequential transducer, which
starts with the variables appearing in $F(q_n)$. Given a letter $(a,q)$ and a set $S$,
we have $S \xrightarrow{(a,q) \mid (a,q,S)} S'$ where $S' = \{X \in \mathcal{X} \mid \exists Y \in S,\ X \text{ occurs in } \rho(q,a,Y)\}$.

Then given an input word $(a_1, q_0)(a_2, q_1) \cdots (\dashv, q_n)$, we define $S_n = F(q_n)$ and
$S_{i-1} = \{X \in \mathcal{X} \mid \exists Y \in F(q_n),\ X \in \sigma_{q_{i-1}, a_i \ldots a_n}(Y)\}$, and we get the output word
$\mathcal{A}'' \circ \mathcal{A}'(\vdash u \dashv) = (a_1, q_0, S_1)(a_2, q_1, S_2) \cdots (a_n, q_{n-1}, S_n)(\dashv, q_n, \emptyset)$.
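
The two enrichment passes can be sketched as follows. This is a minimal
illustration under stated assumptions, not the construction itself: the
one-state SST (`DELTA`, `RHO`, `FINAL`), the marker string `END` and the function
name `enrich` are all hypothetical. Updates are lists of symbols in which
upper-case names are variables and lower-case letters are output letters.

```python
END = "-|"                          # stand-in for the right endmarker
VARIABLES = {"X", "Y", "Z"}

# Hypothetical 1-bounded (not copyless) SST with a single state q.
DELTA = {("q", "a"): "q", ("q", "c"): "q"}
RHO = {("q", "a"): {"X": ["X", "a"], "Y": ["X", "b"], "Z": ["Z", "c"]},
       ("q", "c"): {"X": ["Y", "Z"], "Y": [], "Z": []}}
FINAL = {"q": ["e", "X", "d"]}


def enrich(word, q0="q"):
    """Return the word enriched with the previous state and the useful variables."""
    # First pass (left to right): states[j] is the state reached after j letters.
    states = [q0]
    for a in word:
        states.append(DELTA[(states[-1], a)])
    n = len(word)
    # Second pass (right to left): useful[j] is the set S_j of variables
    # that still contribute to the final output after j letters.
    useful = [set() for _ in range(n + 1)]
    useful[n] = {s for s in FINAL[states[n]] if s in VARIABLES}
    for j in range(n, 0, -1):
        update = RHO[(states[j - 1], word[j - 1])]
        useful[j - 1] = {x for y in useful[j] for x in update[y] if x in VARIABLES}
    enriched = [(word[j - 1], states[j - 1], frozenset(useful[j]))
                for j in range(1, n + 1)]
    enriched.append((END, states[n], frozenset()))
    return enriched


for triple in enrich("aac"):
    print(triple)
```

On the word `aac`, the variable Y is useful only after the second letter, which is
exactly the information needed later to tell the copy of X kept in X from the copy
stored in Y.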

The aim of the third transducer $\mathcal{B}$ is to follow the output structure of T,
which can be defined as follows: The output structure of a run is a labeled and
directed graph such that, for each variable $X$ useful at a position $j$, we have two
nodes $X^i$ and $X^o$ linked by a path whose concatenated labels form the value
stored in $X$ at position $j$ of the run. Formally, the output structure of a run
$q_0 \xrightarrow{u} q_n$ is the oriented graph over $\mathcal{X} \times [1, |u|] \times \{i,o\}$ whose edges are labeled
by output and are of the form:

— $((X,j,i), v, (Y,j-1,i))$ if $\rho(q_{j-1},a_j,X)$ starts with $vY$,

— $((X,j,o), v, (Y,j-1,i))$ if there exists $Z$ such that $XvY$ appears in $\rho(q_{j-1},a_j,Z)$,

— $((X,j,o), v, (Y,j+1,o))$ if $\rho(q_{j-1},a_j,Y)$ ends by $Xv$,

— $((X,j,i), v, (X,j+1,o))$ if $\rho(q_{j-1},a_j,X) = v$.

We furthermore restrict to the connected component corresponding to the actual
output of the run.

Now let $\mathcal{B} = (A \times Q \times 2^{\mathcal{X}}, B, P, p_0, \mu, \nu, \{f\})$ be defined by:

— $P = \mathcal{X} \times \{i, o\} \uplus \{p_0, f\}$ is the set of states. The transducer does a first
left-to-right reading of the input in state $p_0$. The subset $\mathcal{X} \times \{i,o\}$ will then be
used to follow the output structure while keeping track of which variable we
are currently producing. The set $\{i, o\}$ stands for in and out and corresponds
to the similar notions in the output structure. Informally, in states will move
to the left, while out states move to the right. The states $p_0$ and $f$ are new
states that are respectively initial and final.

The transition function $\mu : P \times A \times Q \times 2^{\mathcal{X}} \to P \times \{-1, 0, +1\}$ and the production
function $\nu : P \times A \times Q \times 2^{\mathcal{X}} \to B^*$ are detailed below. In the following, we consider
that the transducer is in state $p$ reading the triplet $t = (a, q, S)$ or one of the
endmarkers (see Figure 5).

— If $p = p_0$ and $a \neq \dashv$, then we set $\mu(p_0, t) = (p_0, +1)$ and $\nu(p_0, t) = \varepsilon$.

— If $p = p_0$ and $a = \dashv$, then if $F(q)$ starts by $uX$ with $u \in B^*$ and $X \in \mathcal{X}$, then
$\mu(p, t) = ((X, i), -1)$ and $\nu(p, t) = u$.

— If $p = (X, i)$ and $t \neq \vdash$, then:

  • either $\rho(q,a)(X) = u \in B^*$ and does not contain any variable, and we
    set $\mu(p,t) = ((X, o), +1)$ and $\nu(p,t) = u$,

  • or $\rho(q,a)(X)$ starts by $uY$ with $u \in B^*$ and $Y \in \mathcal{X}$, then
    $\mu(p,t) = ((Y, i), -1)$ and $\nu(p,t) = u$.

— If $p = (X, i)$ and $t = \vdash$, then $\mu(p, t) = ((X, o), +1)$ and $\nu(p, t) = \varepsilon$.

— If $p = (X, o)$ and $a \neq \dashv$, then let $Y$ be the unique variable of $S$ such that $X$
appears in $\rho(q,a)(Y)$. Then we have:

  • either $\rho(q,a)(Y)$ ends by $Xu$ with $u$ in $B^*$ and we set $\mu(p, t) = ((Y, o), +1)$
    and $\nu(p, t) = u$,

  • or $\rho(q,a)(Y)$ is of the form $(B \cup \mathcal{X})^* X u X' (B \cup \mathcal{X})^*$ and we set
    $\mu(p, t) = ((X', i), -1)$ and $\nu(p, t) = u$.

  Note that the unicity of such $Y$ in $S$ is due to the 1-boundedness property.
  If T is copyless, then this information is irrelevant and $\mathcal{A}''$ can be bypassed.

— If $p = (X, o)$, $q \in Q_f$ and $a = \dashv$, then:

  • either $F(q)$ ends by $Xu$ with $u$ in $B^*$ and we set $\mu(p, t) = (f, +1)$ and
    $\nu(p, t) = u$,

  • or $F(q)$ is of the form $(B \cup \mathcal{X})^* X u X' (B \cup \mathcal{X})^*$ and we set
    $\mu(p, t) = ((X', i), -1)$ and $\nu(p, t) = u$.

Then we can conclude the proof as $T = \mathcal{B} \circ \mathcal{A}'' \circ \mathcal{A}'$ and 2-way transducers
are closed by composition [5].
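
The behaviour of the third transducer on an enriched word can be simulated
directly. The sketch below is illustrative only: the one-state SST (`RHO`,
`FINAL`), the hand-written enriched word for the input `aac`, the marker strings
and the names `scan` and `run_b` are assumptions of this example. It also assumes
that the run simply stops when the final output contains no variable, a case the
transition list above does not spell out.

```python
START, END = "|-", "-|"             # stand-ins for the two endmarkers
VARIABLES = {"X", "Y", "Z"}

# Hypothetical SST: on a, X := X a, Y := X b, Z := Z c; on c, X := Y Z, Y, Z := empty.
RHO = {("q", "a"): {"X": ["X", "a"], "Y": ["X", "b"], "Z": ["Z", "c"]},
       ("q", "c"): {"X": ["Y", "Z"], "Y": [], "Z": []}}
FINAL = {"q": ["e", "X", "d"]}


def scan(rhs, start):
    """Read the output letters of rhs from index start up to the next variable."""
    letters = []
    for symbol in rhs[start:]:
        if symbol in VARIABLES:
            return "".join(letters), symbol
        letters.append(symbol)
    return "".join(letters), None


def run_b(enriched):
    tape = [START] + enriched
    pos, state, out = 0, "p0", []
    while state != "f":
        cell = tape[pos]
        if cell == START:
            # Variables are empty before the first letter: (X, i) becomes (X, o).
            if state != "p0":
                state = (state[0], "o")
            pos += 1
            continue
        a, q, useful = cell
        if state == "p0" and a != END:
            pos += 1                              # initial left-to-right pass
            continue
        if a == END:                              # follow the final output F(q)
            target, rhs = None, FINAL[q]
            start = 0 if state == "p0" else rhs.index(state[0]) + 1
        elif state[1] == "i":                     # start producing the value of X
            target, rhs, start = state[0], RHO[(q, a)][state[0]], 0
        else:                                     # (X, o): unique useful Y containing X
            (target,) = [y for y in useful if state[0] in RHO[(q, a)][y]]
            rhs = RHO[(q, a)][target]
            start = rhs.index(state[0]) + 1
        letters, nxt = scan(rhs, start)
        out.append(letters)
        if nxt is not None:
            state, pos = (nxt, "i"), pos - 1      # descend into the next variable
        elif a == END:
            state = "f"
        else:
            state, pos = (target, "o"), pos + 1   # the value of target is complete
    return "".join(out)


# Enriched word for the input aac, written out by hand for this example.
ENRICHED = [("a", "q", {"X", "Z"}), ("a", "q", {"Y", "Z"}),
            ("c", "q", {"X"}), (END, "q", set())]
print(run_b(ENRICHED))
```

Evaluating the same SST directly on `aac` gives X = abcc after the last letter, so
the expected output is `eabccd`. Note how the set attached to each letter decides,
in an out state for X, whether the run continues in X or in Y.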

Regarding complexity, a careful analysis of the composition construction shows
that composing a one-way transducer of size $m$ with a two-way transducer of size
$n$ can be done by a two-way transducer of size $O(n m^m)$. Then given a
1-bounded SST with $n$ states and $m$ variables, we can construct a deterministic
two-way transducer of size $O(m (2^m)^{2^m} n^n) = O(m 2^{m 2^m} n^n)$. If T is copyless, the
sequential right-to-left transducer can be omitted, and the resulting 2DFT is of
size $O(m n^n)$.

Theorem 5. Let T be an aperiodic 1-bounded SST. Then the equivalent 2DFT
constructed using Theorem 4 is also aperiodic.

Proof. We prove separately the aperiodicity of the three transducers. Then the
result comes from the fact that aperiodicity is preserved by composition of a
one-way by a two-way [3].

First, consider the transducer $\mathcal{A}'$. It is a one-way transducer that simply
enriches the input word with transitions from T, each enrichment corresponding
to the transition taken by T in the previous step. Since T is aperiodic, so
is its underlying automaton. Then the enrichment and thus $\mathcal{A}'$ are aperiodic.

Secondly, given an input word, the transducer $\mathcal{A}''$ stores at each position the
set of variables that will be output by T. Now as T is aperiodic, the flow of
variables is aperiodic. Thus the value taken by this set is aperiodic and so is $\mathcal{A}''$.

Now, consider the transducer $\mathcal{B}$ and a run $r$ of $\mathcal{B}$ over $u^n$ starting in state $p$.
Note that the fact that there exists a run over an enriched input word $v$ implies
that it is well founded, meaning that it is the image of some word of $A^*$ by
$\mathcal{A}'' \circ \mathcal{A}'$. If $p$ is of the form $(X, i)$, then the run starts from the right of $u^n$ and
follows the substitution $\sigma_r(X)$. It exits $u^n$ either in state $(X,o)$ on the right
if $\sigma_r(X)$ is a word of $B^*$, or in a state $(Y,i)$ on the left where $Y$ is the first
variable appearing in $\sigma_r(X)$. In both cases the state at the end of the run only
depends on the underlying automaton of T and the order of variables appearing
in the substitution induced by the run. Since the substitution transition monoid
is aperiodic if T is aperiodic by Theorem 3, a similar run exists over
$u^{n+1}$.

Finally, if $p$ is of the form $(X, o)$, then the state in which the run exits $u^n$
depends on the unique variable $Y$ such that $X$ belongs to $\sigma_r(Y)$ and $Y$ belongs
to the set of variables of the last letter of the input. Then the run follows the
substitution $\sigma_r(Y)$. It will exit the input word in state $(X', i)$ on the left if $XX'$
appears in $\sigma_r(Y)$ for some variable $X'$ and in state $(Y, o)$ otherwise. As the flow
of variables as well as the underlying automaton are aperiodic, a similar run exists
over $u^{n+1}$.

We conclude the proof by noticing that the same arguments will hold to
reduce runs over $u^{n+1}$ to runs over $u^n$.


From 2DFT to copyless SST 

Theorem 6. Let T be a 2DFT with $n$ states. Then we can effectively construct
a copyless SST with $O((2n)^{2n})$ states and $2n - 1$ variables that computes the
same function.

Let $T = (A, B, Q, i, \delta, \gamma, F)$ be a 2DFT. Let us suppose that the transducer
T starts to read its input from the end, and not from the beginning, i.e., given
an input $w$, the initial configuration is $(i, |w| + 1)$. Moreover, let us suppose
that for any transition $(p, \vdash, q, m)$ of T, $\gamma(p, \vdash, q, m) = \varepsilon$. Note that any 2DFT
can be transformed with ease into a transducer satisfying those two properties.

In order to reproduce the behavior of T with an SST T', we need to keep
track of the right-to-right runs of T. Moreover, as we want T' to be copyless, it is
not possible to store the production of a right-to-right run into a single variable,
since two different runs might share a common suffix, and require copy in order
to update the corresponding variables. This leads us to model the right-to-right
runs of T with rooted forests whose vertices are included in $2^Q \setminus \{\emptyset\}$. The
idea is that, given two states $q_1, q_2 \in Q$, a merging between the right-to-right
runs starting from $q_1$ and $q_2$ can be represented by adding an edge from both
$\{q_1\}$ and $\{q_2\}$ towards $\{q_1, q_2\}$. Formally, we use the set $\mathcal{F}_Q$ of rooted forests
$G = (V, E)$ such that $V$ is a subset of $2^Q \setminus \{\emptyset\}$, and the following properties are
satisfied.

— The roots of G are disjoint subsets of Q. 

— For every vertex s, the sons of s are disjoint proper subsets of s. 

Note that two graphs of $\mathcal{F}_Q$ with the same set of vertices are equal, hence each
element of $\mathcal{F}_Q$ is uniquely defined by its set of vertices. In order to also keep
track of the target states of the right-to-right runs, the states of T' are pairs
$(G, \phi)$, where $G \in \mathcal{F}_Q$, and $\phi$ is an injective function mapping each tree of $G$,
corresponding to a set of merging runs, to an element of $Q$, corresponding to the
target state of those runs. In the following definitions, for every vertex $s$ of $G$
we will usually denote by $\phi(s)$ the state $\phi(T_s)$, where $T_s$ is the tree containing
the vertex $s$.

For every word $w$ over the alphabet $A \cup \{\vdash, \dashv\}$, we now present an
inductive construction of a pair $(G_w, \phi_w)$, where $G_w \in \mathcal{F}_Q$ and $\phi_w$ maps the
roots of $G_w$ to $Q$, that contains all the information concerning the right-to-right
runs of T over the input $w$, and their mergings. Formally, for every state $q \in Q$
that appears in at least one vertex of $G_w$, let $s_q \in 2^Q$ be the vertex of $G_w$
of minimal size such that $q \in s_q$. Then we want the following property to be
satisfied.



P1. For all $(p, q) \in \mathrm{bh}_{rr}(w)$, we have $\phi_w(s_p) = q$.

Let $G_\varepsilon \in \mathcal{F}_Q$ be the graph with no vertices, and let $\phi_\varepsilon : V \to Q$ be the empty
function. Note that P1 is trivially satisfied for $w = \varepsilon$, as $\mathrm{bh}_{rr}(\varepsilon)$ is empty, by
definition. Now, let $w \in A^*$, let $a \in A$, and suppose that $(G_w, \phi_w)$ is defined
such that P1 is satisfied for $w$. Then $(G_{wa}, \phi_{wa})$ is built based on $(G_w, \phi_w)$ in
three steps. First, we build a graph $G'_{wa}$ by adding to $G_w$ edges corresponding
to the function $\phi_w$. Second, we build a graph $G''_{wa}$ by adding to $G'_{wa}$ the local
transitions induced by the letter $a$. Finally, we reduce $G''_{wa}$ into an element
$G_{wa} = (V_{wa}, E_{wa})$ of $\mathcal{F}_Q$.

— Let $G'_{wa} = (V'_{wa}, E'_{wa})$ be the graph defined by $V'_{wa} = V_w \cup Q^\bullet$, where $Q^\bullet$ is
a copy of the set $Q$, and $E'_{wa} = E_w \cup \{(\phi_w^{-1}(p), p^\bullet) \in 2^Q \times Q^\bullet \mid p \in \mathrm{Im}(\phi_w)\}$.
Since P1 is satisfied for $w$ by supposition, for every $(p, q) \in \mathrm{bh}_{rr}(w)$ there is
a path in $G'_{wa}$ between $s_p$ and $q^\bullet$.

— Let $G''_{wa} = (V''_{wa}, E''_{wa})$ be the graph defined by $V''_{wa} = V'_{wa} \cup Q^\circ$, where $Q^\circ$
is a copy of the set $Q$, and let $E''_{wa} = E'_{wa} \cup \{(p^\bullet, \tau(p)) \in Q^\bullet \times V''_{wa} \mid p \in Q\}$,
where

\[
\tau(p) = \begin{cases}
q^\circ \in Q^\circ & \text{if } \delta(p,a) = (q,+1), \\
q^\bullet \in Q^\bullet & \text{if } \delta(p,a) = (q,0), \\
s_q \in V_w & \text{if } \delta(p,a) = (q,-1) \text{ and } s_q \text{ is the smallest vertex containing } q.
\end{cases}
\]

For every $(p, q) \in \mathrm{bh}_{rr}(wa)$, there exists a path in $G''_{wa}$ between $p^\bullet$ and $q^\circ$.

— Let $\mathrm{in} : V''_{wa} \to 2^Q$ be the function mapping each vertex $s$ of $G''_{wa}$ to the
set of states $q$ such that there exists a path from $q^\bullet$ to $s$ in $G''_{wa}$. Moreover, let
$\mathrm{out} : V''_{wa} \to Q \cup \{\bot\}$ be the function mapping each vertex $s$ of $G''_{wa}$ to $\bot$
if the unique path starting from $s$ loops infinitely, and to $q \in Q$ if the target
of this path is the vertex $q^\circ \in Q^\circ$. Let $G_{wa} = (V_{wa}, E_{wa})$, where
$V_{wa} = \{\mathrm{in}(s) \subseteq Q \mid s \in V''_{wa},\ \mathrm{out}(s) \neq \bot\}$, and let $\phi_{wa}$ be the function mapping each
vertex $\mathrm{in}(s)$ of $V_{wa}$ to $\mathrm{out}(s)$. For every $(p, q) \in \mathrm{bh}_{rr}(wa)$, since $\mathrm{out}(p^\bullet) = q$,
$\mathrm{in}(p^\bullet)$ is a vertex of $G_{wa}$, and $\phi_{wa}(\mathrm{in}(p^\bullet)) = \mathrm{out}(p^\bullet) = q$, hence P1 is satisfied,
as $\mathrm{in}(p^\bullet) = s_p$.

Remark 1. Since the construction of $G_{wa}$ only depends on $G_w$ and $a$, for every
$w' \in A^*$ such that $G_{w'} = G_w$, we have $G_{w'a} = G_{wa}$.

Remark 2. By construction, for each subset $s$ of $Q$, $s$ is a vertex of the graph
$G_w$ if and only if the right-to-right runs of T over $w$ whose starting state
belongs to $s$ merge at some point, before merging with any other run.

The next results will allow us to obtain the bound over the size of the set of
states presented in the statement of Theorem 6.


Lemma 2. Every rooted forest $G = (V, E)$ in $\mathcal{F}_Q$ has at most $2|Q| - 1$ vertices.


Proof. This is proved by induction over $|Q|$. If $|Q| = 1$, $|2^Q \setminus \{\emptyset\}| = 1$, hence
$|V| \leq 1 = 2|Q| - 1$. Now suppose that $|Q| > 1$, and that the result is
true for every set $Q'$ such that $|Q'| < |Q|$. Let $G'$ be the graph obtained by
removing all the roots of $G$. Then $G'$ is a union of trees $G_1 = (V_1, E_1)$,
$G_2 = (V_2, E_2)$, ..., $G_m = (V_m, E_m)$. For every $1 \leq i \leq m$, the forest $G_i$ is an element
of $\mathcal{F}_{s_i}$, where $s_i \subseteq Q$ denotes the root of $G_i$. Therefore, by using the induction
hypothesis, we have

\[
\begin{aligned}
|V| &= 1 + |V_1| + |V_2| + \dots + |V_m| \\
    &\leq 1 + (2|s_1| - 1) + (2|s_2| - 1) + \dots + (2|s_m| - 1) \\
    &= 2(|s_1| + \dots + |s_m|) + 1 - m \\
    &\leq 2|Q| - 1.
\end{aligned}
\]
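
The bound of Lemma 2 can be checked by brute force on small sets. The sketch below
relies on an assumption made for this example: since a forest of this kind is
determined by its vertex set, it is modelled as a laminar family of non-empty
subsets of $Q$ (any two members are disjoint or nested). The function names are
hypothetical.

```python
from itertools import combinations


def nonempty_subsets(n):
    """All non-empty subsets of {0, ..., n-1}."""
    return [frozenset(c) for r in range(1, n + 1) for c in combinations(range(n), r)]


def is_laminar(family):
    """True when any two members of the family are disjoint or nested."""
    return all(a.isdisjoint(b) or a <= b or b <= a
               for a, b in combinations(family, 2))


def max_laminar_size(n):
    """Largest size of a laminar family of non-empty subsets of an n-element set."""
    subsets = nonempty_subsets(n)
    for size in range(len(subsets), 0, -1):
        if any(is_laminar(family) for family in combinations(subsets, size)):
            return size
    return 0


for n in range(1, 5):
    print(n, max_laminar_size(n), 2 * n - 1)
```

The search is exponential in the number of subsets, so it is only meant as a sanity
check for very small values of $|Q|$.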

Lemma 3. Let $n = |Q|$. The size of $\mathcal{F}_Q$ is smaller than or equal to $\frac{(2n)^{2n-2}}{(n-2)!}$.

Proof. By Cayley's Formula, there exist exactly $(2n)^{2n-2}$ labeled trees on $2n$
vertices. The result follows from the fact that any element of $\mathcal{F}_Q$ can be
represented by a tree on $2n$ vertices, of which at most $n + 2$ are labeled, which justifies
the denominator, as the labels of $(n - 2)$ vertices can be forgotten.

Given an element $G = (V, E)$ of $\mathcal{F}_Q$, we know that $|V| \leq 2n - 1$ by Lemma 2.
Let $G'$ be the tree on $2n$ vertices obtained by adding to $G$ a vertex $s_\top$, an
edge from each root of $G$ to $s_\top$, and, if $|V| < 2n - 1$, a linear path composed of
$2n - |V| - 1$ new vertices starting from a vertex $s_\bot$, and whose end is linked to
$s_\top$. Then $G$ can be computed back from any graph isomorphic to $G'$, as long as
we set the labels $\top$ and $\bot$ to the vertices corresponding to $s_\top$ and $s_\bot$, and the
label $q$ to the vertex corresponding to $s_q$, where $s_q$ is the smallest vertex of $V$
containing $q$, for every $q$ appearing in a vertex of $G$.

Corollary 3. Let $n = |Q|$. The size of $\{(G_w, \phi_w) \mid w \in A^*\}$ is smaller than or
equal to $(2n)^{2n}$.

Proof. For every $w \in A^*$, since $\phi_w$ is an injective function mapping the trees of
$G_w$ to $Q$, and $G_w$ contains at most $n$ trees,

\[ |\{(G_w, \phi_w) \mid w \in A^*\}| \leq n! \, |\mathcal{F}_Q| \leq (2n)^{2n}. \]

Another consequence of Lemma 2 is that for every word $w \in A^*$, the graph
$G_w = (V_w, E_w)$ admits an injective vertex labeling $\lambda_w : V_w \to \mathcal{X}$, where
$\mathcal{X} = \{X_1, \ldots, X_{2|Q|-1}\}$ is a set containing $2|Q| - 1$ variables. We shall now present, for
every $w \in A^*$, the construction of a substitution $\sigma_w \in \mathcal{S}_{\mathcal{X},B}$ that will allow us,
together with the graph $G_w$ and its vertex labeling $\lambda_w$, to describe the output
production of the right-to-right runs of T over $w$. Formally, for every
$(p, q) \in \mathrm{bh}_{rr}(w)$, let $w_{p,q} \in B^*$ denote the production of the corresponding right-to-right
run. Moreover, for every vertex $s$ of $G_w$, let $\bar{\lambda}_w(s)$ denote the concatenation of
the $\lambda$-labels of the vertices forming the path starting from $s$ in $G_w$, and for
every state $q \in Q$ that appears in at least one vertex of $G_w$, let $s_q \in 2^Q$ be
the vertex of $G_w$ of minimal size such that $q \in s_q$. Then we want the following
property to be satisfied.

P2. For all $(p, q) \in \mathrm{bh}_{rr}(w)$, we have $\sigma_w(\bar{\lambda}_w(s_p)) = w_{p,q}$.

Let $\sigma_\varepsilon$ be the substitution mapping each variable to $\varepsilon$. Once again, since
$\mathrm{bh}_{rr}(\varepsilon)$ is empty, P2 is trivially satisfied for $w = \varepsilon$. Let $w \in A^*$, let $a \in A$, and
let us suppose that P2 is satisfied for $w$. The substitution $\sigma_{wa}$ is defined as the
composition of a substitution $\sigma_{w,a}$, whose construction we will now present, with
$\sigma_w$. In order to build $\sigma_{w,a}$, we define a vertex labeling $\mu_{wa} : V_{wa} \to (\mathcal{X} \cup B)^*$.

We require $\mu_{wa}$ to be copyless, i.e., for any variable $X \in \mathcal{X}$, there exists at most
one vertex $s$ such that $X$ occurs in $\mu_{wa}(s)$. Then, we define $\sigma_{w,a}$ as the copyless
substitution mapping $\lambda_{wa}(s)$ to $\mu_{wa}(s)$. The labeling $\mu_{wa}$ is obtained by first
extending the labeling $\lambda_w$ of $V_w$ to a labeling $\mu''_{wa}$ of $V''_{wa}$, and then reducing it
to $G_{wa}$. We denote by $\bar{\mu}_{wa}(s)$ (resp. $\bar{\mu}''_{wa}(s)$) the concatenation of the labels of
the vertices forming the path starting from a vertex $s$ of $G_{wa}$ (resp. $G''_{wa}$).

— Let $\mu''_{wa} : V''_{wa} \to \mathcal{X} \cup B^*$ be the labeling mapping $s \in V_w$ to $\lambda_w(s)$,
$q^\circ \in Q^\circ$ to $\varepsilon$, and $p^\bullet \in Q^\bullet$ to $\gamma(p, a, q, m) \in B^*$, where $\delta(p, a) = (q, m)$. Since
$\lambda_w$ is injective, this labeling is copyless. Moreover, since P2 is satisfied for $w$
by supposition, by definition of $G''_{wa}$ we have, for every $(p, q) \in \mathrm{bh}_{rr}(wa)$,

\[ \sigma_w(\bar{\mu}''_{wa}(p^\bullet)) = w_{p,q}. \]

— For every $t \in V_{wa}$, the set of vertices $s$ of $G''_{wa}$ such that $\mathrm{in}(s) = t$ is not
empty, and they form a path $s_1, \ldots, s_m$. Let $\mu_{wa}(t) = \mu''_{wa}(s_1) \cdots \mu''_{wa}(s_m)$. Since
$\mu''_{wa}$ is copyless, so is $\mu_{wa}$. Moreover,

\[ \sigma_w(\bar{\mu}_{wa}(\mathrm{in}(p^\bullet))) = w_{p,q}. \]

Since $\sigma_{wa} = \sigma_w \circ \sigma_{w,a}$, $\sigma_{w,a}(\lambda_{wa}(s)) = \mu_{wa}(s)$ by definition, and $\mathrm{in}(p^\bullet) = s_p$,
this proves that P2 is satisfied for $wa$.

Remark 3. Since the construction of $\sigma_{w,a}$ only depends on $G_w$ and $a$, for every
$w' \in A^*$ such that $G_{w'} = G_w$, we have $\sigma_{w',a} = \sigma_{w,a}$.

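We are now ready to define formally the copyless SST $T' = (A, B, P, j, Q_f, \alpha, \mathcal{X}, \beta, F')$.

— $P = \{(G_{\vdash w}, \phi_{\vdash w}) \mid w \in A^*\}$,

— $j = (G_\vdash, \phi_\vdash)$,

— $Q_f = \{(G_{\vdash w}, \phi_{\vdash w}) \mid \phi_{\vdash w \dashv}(s_i) \in F\}$,

— $\alpha : P \times A \to P$, $((G_{\vdash w}, \phi_{\vdash w}), a) \mapsto (G_{\vdash wa}, \phi_{\vdash wa})$, which is well-defined, by
Remark 1,

— $\mathcal{X} = \{X_i \mid 1 \leq i \leq 2|Q| - 1\}$,

— $\beta : P \times A \to \mathcal{S}_{\mathcal{X},B}$, $((G_{\vdash w}, \phi_{\vdash w}), a) \mapsto \sigma_{\vdash w, a}$, which is well-defined, by
Remark 3,

— $F' : Q_f \to (\mathcal{X} \cup B)^*$, $(G_{\vdash w}, \phi_{\vdash w}) \mapsto \bar{\mu}_{\vdash w \dashv}(s_i)$.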

The state reached by T' over an input word $w \in A^*$ is $(G_{\vdash w}, \phi_{\vdash w})$, and the
substitution induced by the corresponding run is $\sigma_{\vdash w}$. Therefore, by P1 and the
definition of $Q_f$, the domains of the functions defined by T and T' are identical.
Moreover, since $\sigma_\vdash$ is the substitution mapping all the variables to $\varepsilon$, by
supposition, we have, by P2 and the definition of $F'$, that the images of a given word
by those two functions are also identical, hence the functions are the same.


Theorem 7. Let T be an aperiodic 2DFT. Then the equivalent SST constructed
using Theorem 6 is also aperiodic.

To prove this theorem, we introduce a new equivalence relation, and prove
that the aperiodicity of the 2DFT implies aperiodicity of this relation. The
aperiodicity of the SST then follows from this. We say that two words $v$ and $w$ are
merge equivalent if they induce the same merges in the same order in
their four behavior relations. Let us remark that if $v$ and $w$ are merge equivalent,
then for any word $u$, $G_{uv} = G_{uw}$. This is due to the fact that $G_{uv}$ represents the
merges of the right-to-right runs over $uv$, and these runs can be decomposed into
right-to-right runs over $u$ and partial runs over $v$.

Lemma 4. Let $w \in A^*$ be such that $w^n$ and $w^{n+1}$ have the same behaviors in T.
Then $w^n$ and $w^{n+1}$ are merge equivalent.

Proof. Let $w$ and $n$ be such that $w^n$ and $w^{n+1}$ have the same behaviors. By
definition, they model the same partial runs, and thus the same merges appear. Thus
we only need to prove that they appear in the same order. Consider two merges that
appear consecutively in $w^n$. We prove that they appear in the same order in $w^{n+1}$,
depending on the kind of partial run that we consider. If the merges are of
right-to-right or left-to-left runs, then the exact same runs appear in $w^{n+1}$ and thus
the same merges appear. If they affect right-to-left (resp. left-to-right) runs, then
the rightmost (resp. leftmost) $n$ iterations of $w$ in $w^{n+1}$ will merge these runs.
By noticing that after merging, two runs cannot be separated again, the two
merges will appear in the same order in $w^{n+1}$, concluding the proof.

The two following lemmas now conclude the proof, since the aperiodicity of
both the underlying automaton and the substitution imply the aperiodicity of
the whole SST.

Lemma 5. Let $w \in A^*$ be such that $w^n$ and $w^{n+1}$ have the same behaviors in T.
Then for every $u \in A^*$, $G_{uw^{n+1}} = G_{uw^n}$.

Proof. This lemma comes directly from Lemma 4 and the previous remark stating
that the merge equivalence implies the equivalence for the underlying
automaton of the SST.

Lemma 6. Let $w \in A^*$ be such that $w^n$ and $w^{n+1}$ have the same behaviors in T.
Then for every $u \in A^*$, the substitutions $\sigma_{u,w^n}$ and $\sigma_{u,w^{n+1}}$ are equal when
production is erased.

Proof. By construction, reducing a graph $G''_{ua}$ from the proof of Theorem 6 to
$G_{ua}$ amounts to deleting the unnecessary information from $G_u$, i.e. deleting cycles
and reducing paths with no new merges to a single vertex. Thus each vertex
from $G_u$ can either be traced back to a vertex of $G_{ua}$ or is deleted. Should we
forget about the production, the flow of variables then corresponds exactly to
this. Thanks to Lemma 4, we know that $w^n$ and $w^{n+1}$ are merge equivalent.
Then given a state $G_u$ and one of its vertices, it can be traced back to the same
vertex after reading $w^n$ or $w^{n+1}$. Since we have a unique way of mapping vertices
to variables, the substitutions $\sigma_{u,w^n}$ and $\sigma_{u,w^{n+1}}$ will be equal when production
is erased, proving the aperiodicity of the substitution function.


From k-bounded to 1-bounded SST


Theorem 8. Given a k-bounded SST T with $n$ states and $m$ variables, we can
effectively construct an equivalent 1-bounded SST. This new SST has $n 2^N$ states
and $mkN$ variables, where $N = O(n^n (k + 1)^{nm})$ is the size of the flow transition
monoid $M_T$.

Proof. In order to move from a k-bounded SST to a 1-bounded SST, the natural
idea is to use copies of each variable. However, we cannot maintain $k$ copies of
each variable all the time: suppose that $X$ flows into $Y$ and $Z$, which both occur
in the final output. If we have $k$ copies of $X$, we cannot produce in a 1-bounded
way $k$ copies of $Y$ and $k$ copies of $Z$. We will thus limit, for each variable $X$, the
number of copies of $X$ we maintain. In order to get this information, we will use
look-ahead information on the suffix of the run.

The proof relies on the following fact: suppose that we know at each step 
what is the substitution induced by the suffix of the run. From this substitution, 
for each variable X we know the value of the integer n such that X will be 
involved exactly n times in the final output. We can thus copy each variable 
sufficiently many times and use them to produce this substitution in a copyless 
fashion. 

One can observe that there are finitely many substitutions, this information 
being held in the transition monoid of the SST. Then we can compute, at each 
step and for each possible substitution, a copyless update. But as a given element 
of the monoid may have several successors, the update function flows variables 
from one element to variables of several elements. As these variables are never 
recombined, we get the 1-boundedness of the construction.

Let $T = (A, B, Q, q_0, Q_f, \delta, \mathcal{X}, \rho, F)$ be an aperiodic k-bounded SST, $M_T$ be
its transition monoid, and $\eta_T : A^* \to M_T$ be its transition morphism.

We construct $T' = (A, B, Q', q'_0, Q'_f, \delta', \mathcal{X}', \rho', F')$ where:

— The set of states $Q' = Q \times \mathcal{P}(M_T)$ is the current state plus a set of elements
of $M_T$ corresponding to the possible images of the current suffix.

— $q'_0 = (q_0, S_0)$ where $S_0 = \{m \in M_T \mid \delta(q_0, m) \in Q_f\}$ is the set of relevant
possible images of input words. Here, we are abusing notations as $\delta(q, m)$
stands for $\delta(q, u)$ where $\eta_T(u) = m$. By definition of the transition monoid,
we have that $\eta_T(u) = \eta_T(v)$ implies $\delta(q, u) = \delta(q, v)$, thus this is well defined.

— $Q'_f = \{(q, S) \mid q \in Q_f \text{ and } 1_{M_T} \in S\}$.

— $\delta' : Q' \times A \to Q'$ is defined by $\delta'((q, S), a) = (q', S')$ where $q' = \delta(q, a)$ and
$S' = \{m \in M_T \mid \eta_T(a) m \in S\}$.

— $\mathcal{X}' = \mathcal{X} \times M_T \times \{1, \ldots, k\}$. Variables from $\mathcal{X}'$ will be denoted $X^m_i$ for
$X \in \mathcal{X}$, $i \leq k$ and $m \in M_T$.

— The variable update function is defined as follows. First, given a state $q$ of
T and an element $m$ of $M_T$, we define $\sigma_{q,m}$ as the projection of the output
substitution induced by a run starting on $q$ on a word whose image is $m$,
i.e. $\sigma_{q,m} = \sigma_{q,u} \circ F(\delta(q, u))$ for $\eta_T(u) = m$. Note that by definition of the
transition monoid of an SST, it is well defined.

Now consider a transition $(q, S) \xrightarrow{a} (q', S')$, $n \in S'$ and $0 < i \leq |\sigma_{q',n}|_X$.
Then $\rho'((q, S), a, X^n_i)$ is defined similarly to $\rho(q, a, X)$, where all variables are
labeled by the element $\eta_T(a) n$ and numbered to ensure the 1-bounded
property. Such a numbering is possible thanks to the fact that $n$ indicates which
variables are used as well as how many times. This allows us to copy each
variable the right number of times, using different copies at each occurrence.
The k-bounded property then ensures that we will never need more than $k$
variables for a possible output.

— $F' : Q'_f \to (B \cup \mathcal{X}')^*$ is defined as follows. Let $(q, S) \in Q'_f$. The string
$F'(q, S)$ is obtained from the string $F(q)$ by substituting each variable $X$ by
a variable $X^n_i$, where $0 < i \leq |F(q)|_X$ and $n = 1_{M_T}$.
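
The look-ahead component of the states can be illustrated on a small example. The
sketch below is a simplification made for illustration: it uses the transition
monoid of the underlying automaton only (the flow information carried by $M_T$ is
ignored), elements are encoded as tuples mapping state indices to state indices,
and the automaton and all names are hypothetical.

```python
def compose(f, g):
    """Image of uv from the images f of u and g of v (apply f first, then g)."""
    return tuple(g[q] for q in f)


def generate_monoid(generators, n_states):
    """Close the identity under right multiplication by the letter images."""
    identity = tuple(range(n_states))
    monoid, frontier = {identity}, [identity]
    while frontier:
        m = frontier.pop()
        for g in generators:
            product = compose(m, g)
            if product not in monoid:
                monoid.add(product)
                frontier.append(product)
    return monoid


def lookahead_step(S, letter_image, monoid):
    """S' = { m | eta(a) m in S }: suffix images still leading to acceptance."""
    return frozenset(m for m in monoid if compose(letter_image, m) in S)


# Hypothetical automaton on states {0, 1}: a leads to 1, b leads to 0, 1 accepts.
ETA = {"a": (1, 1), "b": (0, 0)}
IDENTITY = (0, 1)
MONOID = generate_monoid(ETA.values(), 2)
S0 = frozenset(m for m in MONOID if m[0] == 1)     # images accepted from state 0

S_after_a = lookahead_step(S0, ETA["a"], MONOID)
S_after_b = lookahead_step(S0, ETA["b"], MONOID)
print(sorted(S_after_a), sorted(S_after_b))
```

After reading `a`, the identity belongs to the look-ahead set, which matches the
acceptance condition on $Q'_f$: the empty suffix is a valid continuation.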

Theorem 9. Let T be an aperiodic k-bounded SST. Then the equivalent
1-bounded SST constructed using Theorem 8 is also aperiodic.

Proof. We now have to prove that T' is aperiodic. We claim that the runs of T'
are of the form $(q, S) \xrightarrow{u} (q', S')$ where $q \xrightarrow{u}_T q'$ and $S' = \{n \in M_T \mid \eta_T(u) n \in S\}$,
which holds by construction of T'. The update for such a run is then the
update of T over the run $q \xrightarrow{u}_T q'$, where variables are labeled by elements from
$S$ and $S'$ and numbered accordingly. Then as T is aperiodic, the $Q$ part of the run
is also aperiodic by construction. The other part computes sets of runs according
to $M_T$, which is also aperiodic. Then T' will also be aperiodic as the set $S'$ only
depends on the image of the word read, and by definition $\eta_T(u^n) = \eta_T(u^{n+1})$
for $n$ large enough.


